dnadna.utils.config

Config file, serialization, and schema handling.

Functions

load_dict(filename, **kwargs)

Loads a nested JSON-like data structure from a given filename.

load_dict_from_json(filepath)

Load a JSON file as a dict.

save_dict(obj, filename, **kwargs)

Serializes a nested JSON-like data structure to a given filename.

save_dict_annotated(obj, filename[, schema, ...])

Serializes a (possibly nested) dict to YAML, after (optionally) validating it against the given schema, and producing comments from the title/description keywords in the schema.

save_dict_in_json(filepath, params)

Save a dictionary into a json file.

Classes

Config(*args[, overrides, validate, schema, ...])

Represents the configuration for one of DNADNA's components, such as simulation and training configuration.

ConfigMixIn([config, validate])

Mix-in for classes that accept a Config object to provide part of their attribute namespace.

ConfigValidator(schema, *args[, ...])

A custom validator wrapping the jsonschema.Draft7Validator class which supports special validation functionality for DNADNA Config objects.

DeepChainMap(*maps[, overrides])

Like collections.ChainMap, but also automatically applies chaining recursively to nested dictionaries.

Exceptions

ConfigError(config, msg[, suffix, path])

class dnadna.utils.config.Config(*args, overrides=[], validate=False, schema=None, filename=None, resolve_inherits=False, resolve_overrides=True)[source]

Bases: DeepChainMap

Represents the configuration for one of DNADNA’s components, such as simulation and training configuration.

This is a specialized subclass of DeepChainMap, with extra bells and whistles, in particular validating the configuration against a JSON Schema using a specialized schema validator with enhanced functionality over the default jsonschema.validate. See ConfigValidator for examples of the extra functionality provided by the custom schema validator used by Config.

Another special feature of Config is to link multiple config files together with a special “inherit” property: If the value of a keyword in the config is a dict containing the “inherit” key, the value of that dict is loaded directly from the file pointed to by “inherit”. Any additional keys in the dict containing “inherit” override/extend the dict loaded from the inherit. See the Examples section below for explicit examples.

Parameters:

*args – One or more dict or other mapping types from which to instantiate the Config. In normal usage only one dict should be passed. The support for multiple positional arguments is in order to support the underlying DeepChainMap functionality. The reason DeepChainMap is used is to support the “inherit” functionality. Each inherited Config is added to the tree of DeepChainMap.maps.

Keyword Arguments:
  • overrides (list) – (optional) – Same as in DeepChainMap.

  • validate (bool or dict) – (optional) – Validate the given config. This checks two things: that all inherits were successfully resolved, and, if a schema was specified, that the config is valid against that schema (default: False). If a non-empty dict is given instead, validation is enabled, and the dict is passed as keyword arguments to the ConfigValidator class to control its behavior. This is used primarily for implementation purposes.

  • schema (str or dict) – (optional) – The JSON Schema against which to validate the config. This can either be the name of one of the built-in schemas (see Config.schemas) or it can be a full JSON Schema object represented as a dict.

  • filename (str or pathlib.Path) – (optional) – If the Config was read from a file (e.g. as with Config.from_file) this argument can be used to store the name of the file the config was read from. This should normally not be used directly, as it is normally set when using Config.from_file

  • resolve_inherits (bool or dict) – (optional) – If True, make sure all “inherit” keywords in the given config are resolved to their true values, by loading the inherited config files. If resolve_inherits is a non-empty dict this has the same effect as True, except the dict is passed as keyword arguments to the load_dict call that is used to read inherited config files. This argument is primarily for internal use and testing, and should not be used directly without knowing what you’re doing.

Examples

>>> from dnadna.utils.config import Config

Under the simplest usage, a Config is just a simple wrapper for a dict:

>>> config = Config({'a': 1, 'b': 'c'})
>>> config['a']
1
>>> config['b']
'c'

However, when a schema is provided and validate=True, the wrapped dict is validated against that schema. If validation succeeds, the Config object is instantiated without error:

>>> schema = {'properties': {'a': {'type': 'integer'}}}
>>> config = Config({'a': 1, 'b': 'c'}, validate=True, schema=schema)

But when validation fails, instantiating the Config will fail with a jsonschema.ValidationError exception:

>>> config = Config({'a': 'b', 'c': 'd'}, validate=True, schema=schema)
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'a': 'b' is not of type 'integer'

Validation can also be delayed. If a schema was provided but validate=False, a later call to Config.validate will validate the instantiated Config against that schema:

>>> config = Config({'a': 'b', 'c': 'd'}, schema=schema)
>>> config['a']  # successfully created despite violating the schema
'b'
>>> config.validate()  # validates against the previously provided schema
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'a': 'b' is not of
type 'integer'

It is also possible to validate against one of the many built-in schemas given by:

>>> sorted(Config.schemas)
['dataset', 'dataset_formats/dnadna', 'definitions', 'nets/...', ...,
'param-set', ..., 'training', 'training-run']

For example:

>>> from dnadna.examples.one_event import DEFAULT_ONE_EVENT_CONFIG
>>> config = Config(DEFAULT_ONE_EVENT_CONFIG.copy(), schema='simulation')
>>> config.validate() is None
True

You can also view the full values of these schemas, like:

>>> Config.schemas['simulation']
{'$schema': 'http://json-schema.org/draft-07/schema#',
'$id': 'py-pkgdata:dnadna.schemas/simulation.yml',
'type': 'object',
'description': 'JSON Schema (YAML-formatted) for basic properties of a
simulation...',
...}

Now we discuss inherits, which is a slightly complicated subject. The most basic usage is having a key in the config dictionary like "key": {"inherit": "/path/to/inherited/file"}. In this case the value associated with "key" is replaced with the contents of the inherited file:

>>> from dnadna.utils.config import save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> inherited = tmp / 'inherited.json'
>>> save_dict({'foo': 'bar', 'baz': 'qux'}, inherited)
>>> d = {'key': {'inherit': str(inherited)}}

In the original dict, the value of the 'key' key is just as we specified:

>>> d['key']
{'inherit': '...inherited.json'}

But when we instantiate Config from this, the value for 'key' will be transparently replaced with the contents of inherited.json:

>>> config = Config(d, resolve_inherits=True)
>>> config['key']
Config({'foo': 'bar', 'baz': 'qux'})

Inherits can also be nested, so if file A inherits from file B, and file B also contains inherits, the inherits in file B are resolved first, and so on.
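A minimal sketch of nested resolution (hypothetical filenames, reusing the tmp directory and save_dict from above):

>>> nested_inner = tmp / 'nested_inner.json'
>>> nested_outer = tmp / 'nested_outer.json'
>>> save_dict({'x': 1}, nested_inner)
>>> save_dict({'inherit': str(nested_inner), 'y': 2}, nested_outer)
>>> config = Config({'key': {'inherit': str(nested_outer)}},
...                 resolve_inherits=True)
>>> config['key']['x'], config['key']['y']
(1, 2)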

If a dict contains the 'inherit' keyword, as well as other keys, first the inherit is resolved, but then the other keys in the dict override the inherited dict. This is made possible by the use of DeepChainMap:

>>> d = {'key': {
...         'inherit': str(inherited),
...         'baz': 'quizling',
...         'fred': 'barney',
...     }}
>>> config = Config(d, resolve_inherits=True)
>>> config['key']
Config({'baz': 'quizling', 'fred': 'barney', 'foo': 'bar'})

In the previous examples we used absolute filename paths with 'inherit', but it may also contain a relative path. In that case there are two possibilities: if the parent config does not have a .filename, relative paths are simply resolved relative to the current working directory. This is not terribly useful, because the same path might resolve to a different file depending on what directory you’re currently working in. More usefully, when the parent config does have a .filename, relative paths are resolved relative to the directory containing the parent file.

For example, let’s put a parent and child file in the same directory:

>>> parent_filename = tmp / 'parent.json'
>>> child_filename = tmp / 'child.json'
>>> save_dict({'a': 1}, child_filename)
>>> save_dict({'foo': {'inherit': 'child.json'}, 'b': 2}, parent_filename)

As noted, both files are in the same directory:

>>> parent_filename.parent == child_filename.parent
True

So we could specify just {'inherit': 'child.json'}, meaning inherit from the file child.json in the same directory as me:

>>> parent = Config.from_file(parent_filename)
>>> parent
Config({'foo': {'a': 1}, 'b': 2})
>>> parent['foo']
Config({'a': 1})

This feature is particularly useful when there are multiple config files in a rigid directory structure, where one file is always going to be in the same position in the file hierarchy relative to the files it inherits from. So the relationship between the files is maintained even if the root of the directory structure is moved, e.g. between different machines.

copy(folded=False)[source]

New Config or subclass with a new copy of maps[0] and refs to maps[1:].

If folded=True, however, it returns a copy with all maps folded in so that there is only one map in the resulting copy; that is, it is equivalent to Config(chain_map.dict()).

Also copies the filename.

classmethod from_default(name, validate=True, schema=None, resolve_inherits=True, **kwargs)[source]

Load one of the default config files from dnadna.DEFAULTS_DIR.

The filename extension may be omitted, so that Config.from_default('simulation') is the same as Config.from_default('simulation.yml'); as such that directory should not contain any conflicting filenames.

Remaining keyword arguments are the same as those to Config, with the exception that the schema argument may only be a string, since the use of lru_cache means all arguments must be hashable.

By default, the default config file is validated against the schema of the same name. For example, Config.from_default('dataset') validates against the 'dataset' schema if it exists.

Examples

>>> from dnadna.utils.config import Config
>>> Config.from_default('dataset', schema='dataset')
Config({'data_root': '.', 'dataset_name': 'generic', ...})

classmethod from_file(filename, validate=True, schema=None, resolve_inherits=True, **kwargs)[source]

Read the Config from a supported JSON-like file, currently either a JSON or YAML file.

Parameters:

filename (str or pathlib.Path) – The filename to read from; currently it should have either a .json, .yml, or .yaml extension in order to correctly determine the file format. Other formats implemented by additional subclasses of DictSerializer may be supported in the future.

Keyword Arguments:
  • validate (bool or dict) – (optional) – Same as the validate option to the standard Config constructor (default: True).

  • schema (str or dict) – (optional) – Same as the schema option to the standard Config constructor.

  • resolve_inherits (bool or dict) – (optional) – Same as the resolve_inherits option to the standard Config constructor.

  • **kwargs – Additional keyword arguments are passed to the underlying load_dict call.

Examples

>>> from dnadna.utils.config import Config, save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> filename = tmp / 'config.json'
>>> save_dict({'a': 1}, filename)
>>> schema = {'properties': {'a': {'type': 'integer'}}}
>>> config = Config.from_file(filename, schema=schema)
>>> config['a']
1
>>> str(config.filename)
'...config.json'
>>> schema['properties']['a']['type'] = 'string'
>>> Config.from_file(filename, schema=schema)
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in ".../config.json" at 'a':
1 is not of type 'string'

property schemas

dict mapping the names of built-in schemas to their values.

Built-in schemas are loaded from any .json, .yml, or .yaml files in the directories listed in SCHEMA_DIRS.

Schemas in sub-directories of paths in SCHEMA_DIRS have their subdirectory path prepended to the name with /.
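
For example, as shown in the listing earlier, a schema stored at dataset_formats/dnadna.yml within a schema directory is registered under the name 'dataset_formats/dnadna':

>>> 'dataset_formats/dnadna' in Config.schemas
True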

to_file(filename=None, **kwargs)[source]

Save the Config to the file given by filename.

If the Config was read from a file and has a non-empty .filename attribute, it will be written back to the same file by default.

Additional kwargs depend on the file format and are passed to the appropriate DictSerializer depending on the filename.

This is equivalent to calling save_dict with self.
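
A minimal sketch of a round trip (the saved.yml filename is arbitrary; this assumes the YAML serializer is selected from the .yml extension):

>>> from dnadna.utils.config import Config
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> config = Config({'a': 1})
>>> config.to_file(tmp / 'saved.yml')
>>> Config.from_file(tmp / 'saved.yml')
Config({'a': 1})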

unresolve_inherits(config_dir=None, only=None)[source]

A sort of inversion of Config.resolve_inherits.

This walks through all the chained mappings in this Config; any that have a non-empty .filename are removed from the chained mappings and replaced with an entry in an “inherit” property of the top-level mapping.

This returns a new Config with all the relevant replacements made.

Examples

>>> from dnadna.utils.config import save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> inherited = tmp / 'inherited.json'
>>> save_dict({'foo': 'bar', 'baz': 'qux'}, inherited)
>>> c = Config({'key': {'inherit': str(inherited)}},
...            resolve_inherits=True)
...
>>> c
Config({'key': {'foo': 'bar', 'baz': 'qux'}})
>>> c2 = c.unresolve_inherits()
>>> c2
Config({'key': {'inherit': '...inherited.json'}})

validate(schema=None, **validator_kwargs)[source]

Ensure that the configuration is valid:

  • All keys should be strings (for JSON-compatibility).

  • If a JSON schema is given, validate the config against that schema. The schema may either be a full JSON Schema given as a dict, or a key into the Config.schemas registry.

exception dnadna.utils.config.ConfigError(config, msg, suffix='', path=())[source]

Bases: ValueError
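
ConfigError is raised throughout this module when a config is invalid. A sketch of its message format, inferred from the validation examples elsewhere on this page (exact formatting may differ):

>>> from dnadna.utils.config import Config, ConfigError
>>> raise ConfigError(Config({}), 'something went wrong')
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config: something went wrong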

class dnadna.utils.config.ConfigMixIn(config={}, validate=True)[source]

Bases: object

Mix-in for classes that accept a Config object to provide part of their attribute namespace. Makes top-level keys in the Config object accessible as attributes on instances of the class.

Includes optional validation of the Config against a schema by setting the config_schema class attribute.

The config_schema attribute may be either the name of a built-in schema, or a JSON Schema object (see Config.validate).

If config_default is provided, it provides default values for the config which can be overridden.

Examples

>>> from dnadna.utils.config import Config, ConfigMixIn
>>> class MyClass(ConfigMixIn):
...     config_schema = {
...         'properties': {'a': {'type': 'integer'}}
...     }
...
...     def __init__(self, config, foo=1, validate=True):
...         super().__init__(config, validate=validate)
...         self.foo = foo
...
>>> config = Config({'a': 1, 'b': 'b'})
>>> inst = MyClass(config)
>>> inst.a
1
>>> inst.b
'b'

Assignment to attributes that are keys in the Config also updates the underlying Config. Such updates are not validated against the schema:

>>> inst.b = 'c'
>>> inst.b
'c'
>>> inst.config['b']
'c'

Validation is performed upon instantiation unless passed validate=False:

>>> config = Config({'a': 'a', 'b': 'b'})
>>> inst = MyClass(config)
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'a': 'a' is not of
type 'integer'

Note, if validation is disabled, then there is no guarantee the object will work properly if the config is invalid.

config_attr = 'config'

The name of the attribute in which instances of this class store their Config. Typically this is just the .config attribute.

config_default = {}

Default value for the Config of instances of this class.
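
A sketch of config_default in use (hypothetical class; this assumes defaults from config_default are merged beneath any config passed to __init__):

>>> from dnadna.utils.config import ConfigMixIn
>>> class Defaulted(ConfigMixIn):
...     config_default = {'greeting': 'hello'}
...
>>> Defaulted().greeting
'hello'
>>> Defaulted({'greeting': 'hi'}).greeting
'hi'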

config_schema = None

The schema against which this class should validate its Config by default.

May be either the name of one of the built-in schemas (see Config.schemas) or a full schema object.

classmethod from_config_file(filename, validate=True, **kwargs)[source]

Instantiate from a config file.

This method must be overridden if the subclass takes additional __init__ arguments besides config and validate.
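
A sketch of such an override (hypothetical subclass; this assumes the base implementation loads the file with Config.from_file and forwards it to __init__):

>>> from dnadna.utils.config import Config, ConfigMixIn
>>> class Windowed(ConfigMixIn):
...     def __init__(self, config, window_size=10, validate=True):
...         super().__init__(config, validate=validate)
...         self.window_size = window_size
...
...     @classmethod
...     def from_config_file(cls, filename, window_size=10,
...                          validate=True, **kwargs):
...         # load without validating, then validate in __init__
...         config = Config.from_file(filename, validate=False, **kwargs)
...         return cls(config, window_size=window_size, validate=validate)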

validate_config(config)[source]

Validate the config file with which this class was initialized.

By default it validates the config file against the associated ConfigMixIn.config_schema schema, but this method may be overridden to add additional semantic validation to the config file that is not possible through the schema alone.
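
For example, a subclass might add a semantic check that the schema alone cannot express (hypothetical class and check; this assumes validate_config is called during initialization when validate=True):

>>> from dnadna.utils.config import ConfigMixIn
>>> class Weighted(ConfigMixIn):
...     config_schema = {'properties': {'weights': {'type': 'array'}}}
...
...     def validate_config(self, config):
...         super().validate_config(config)  # schema validation first
...         # a constraint JSON Schema cannot check: weights sum to 1
...         if config.get('weights') and sum(config['weights']) != 1:
...             raise ValueError('weights must sum to 1')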

class dnadna.utils.config.ConfigValidator(schema, *args, resolve_plugins=True, resolve_defaults=True, resolve_filenames=True, posixify_filenames=False, **kwargs)[source]

Bases: object

A custom validator wrapping the jsonschema.Draft7Validator class which supports special validation functionality for DNADNA Config objects:

  • Recognizes Config objects as JSON objects.
  • Adds new string formats:

    • filename: When a Config is loaded from a file, any values in the Config that are recognized by the specified JSON schema as representing a filename are automatically resolved to absolute paths relative to the config file’s location. If the filename is already an absolute filename it is left alone. If the config does not have an associated filename, relative paths are treated as relative to the current working directory.

    • filename!: Same as filename without the !, but a schema validation error is raised if the resulting filename does not exist on the filesystem.

    • python-module: The name of a Python module/package that should be importable via the standard import system (e.g. import dnadna). If an ImportError is raised when trying to import this module a schema validation error is raised.

  • If the schema specifies defaults for any properties, those default values are filled into the Config if it is otherwise missing values for those properties.

  • If the schema specifies an "errorMsg" property, custom error messages for validation errors can be provided and shown to users. See ConfigValidator.validate for examples.

Examples

>>> from dnadna.utils.config import ConfigValidator, Config
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'abspath': {'type': 'string', 'format': 'filename'},
...         'relpath': {'type': 'string', 'format': 'filename'},
...         'nonpath': {'type': 'string'},
...         'has_default_1': {'type': 'string', 'default': 'a'},
...         'has_default_2': {'type': 'string', 'default': 'b'}
...     }
... }
>>> validator = ConfigValidator(schema, posixify_filenames=True)
>>> config = Config({
...     'abspath': '/bar/baz/qux',
...     'relpath': 'fred',
...     'nonpath': 'barney',
...     'has_default_2': 'c'  # override the default
... }, filename='/foo/bar/config.json')
>>> validator.validate(config) is None
True
>>> config
Config({'abspath': '/bar/baz/qux', 'relpath': '/foo/bar/fred',
    'nonpath': 'barney', 'has_default_2': 'c', 'has_default_1': 'a'})
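
The python-module format can be exercised similarly; a sketch (the exact error message is abbreviated here):

>>> schema = {
...     'type': 'object',
...     'properties': {
...         'module': {'type': 'string', 'format': 'python-module'}
...     }
... }
>>> validator = ConfigValidator(schema)
>>> validator.validate({'module': 'dnadna.utils.config'}) is None
True
>>> validator.validate({'module': 'not_a_real_module'})
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'module': ...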

best_match(errors, key=relevance_with_const_select)[source]

Wraps jsonschema.exceptions.best_match to return a CustomValidationError. See the relevance_with_const_select documentation below.

static relevance_with_const_select(error)[source]

This implements a custom heuristic for choosing the best-match error with dnadna.utils.config.ConfigValidator.

It prioritizes CustomValidationErrors over other errors, so that a schema with custom errorMsg properties can decide through that means which errors are most important. This can be especially useful when using errorMsg in a oneOf suite, where the custom error is perhaps more important than the default reason given for why none of the sub-schemas matched. Here’s an example:

>>> schema = {
...     'oneOf': [{
...         'type': 'object',
...         'minProperties': 1,
...         'errorMsg': {
...             'minProperties': 'must have at least 1 entry'
...         }
...    }, {
...        'type': 'array',
...        'minItems': 1,
...        'errorMsg': {
...            'minItems': 'must have at least 1 entry'
...        }
...    }]
... }

This schema matches either an array or an object, which in either case must have at least one property (in the object case) or item (in the array case). Without this custom relevance function, best_match just chooses one of the errors from one of the oneOf sub-schemas which caused it not to match. In this case it happens to select the minItems error from the second sub-schema:

>>> from jsonschema.exceptions import best_match
>>> from dnadna.utils.config import ConfigValidator
>>> validator = ConfigValidator(schema)
>>> errors = validator.iter_errors([])  # try an empty list
>>> best_match(errors)
<ValidationError: '[] should be non-empty'>

Using this custom error ranking algorithm, the CustomValidationError will be preferred:

>>> errors = validator.iter_errors([])  # try an empty list
>>> validator.best_match(errors,
...            key=ConfigValidator.relevance_with_const_select)
<CustomValidationError: 'must have at least 1 entry'>

Otherwise it’s the same as the default heuristic with extra support for a common pattern where oneOf combined with const or enum is used to select from a list of sub-schemas based on the value of a single property.

For example:

>>> schema = {
...     'required': ['type'],
...     'oneOf': [{
...         'properties': {
...             'type': {'const': 'regression'},
...         }
...     }, {
...         'properties': {
...             'type': {'const': 'classification'},
...             'classes': {'type': 'integer'},
...         },
...         'required': ['classes']
...     }]
... }
...

The first schema in the oneOf list will match if and only if the document contains {'type': 'regression'} and the second will match if and only if {'type': 'classification'} with no ambiguity.

In this case, when type matches a specific sub-schema, the more interesting errors are those that occur within that sub-schema. But the default heuristics are such that best_match will consider the type error more interesting. For example:

>>> import jsonschema
>>> jsonschema.validate({'type': 'classification'}, schema)
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'regression' was expected
...

Here the error that matched the heuristic happens to be the one that caused the first sub-schema to be skipped over, because properties.type.const did not match. But the actual reason an error was raised at all is that the second sub-schema didn’t match either, due to the required 'classes' property being missing. Under this use case, that would be the more interesting error. This heuristic solves that. In order to demonstrate this, we have to call best_match directly, since jsonschema.validate doesn’t have an option to pass down a different heuristic key:

>>> from dnadna.utils.config import ConfigValidator
>>> validator = ConfigValidator(schema)
>>> errors = validator.iter_errors({'type': 'classification'})
>>> raise validator.best_match(errors,
...           key=ConfigValidator.relevance_with_const_select)
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'classes' is a required property
...

This also supports a similar pattern (used by several plugins) where instead of const being used to select a specific sub-schema, enum is used with a unique list of values (in fact const is just a special case of enum with only one value). For example:

>>> schema = {
...     'required': ['name'],
...     'oneOf': [{
...         'properties': {
...             'name': {'enum': ['my-plugin', 'MyPlugin']},
...         }
...     }, {
...         'properties': {
...             'name': {'enum': ['my-plugin2', 'MyPlugin2']},
...             'x': {'type': 'integer'},
...         },
...         'required': ['x']
...     }]
... }
...
>>> validator = ConfigValidator(schema)
>>> errors = validator.iter_errors({'name': 'my-plugin2'})
>>> raise validator.best_match(errors,
...           key=ConfigValidator.relevance_with_const_select)
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'x' is a required property
...
validate(config, *args, **kwargs)[source]

Validate the config against the schema and raise a ConfigError if validation fails.

This can be enhanced by an extension to JSON-Schema, the "errorMsg" property which can be added to schemas. All JSON-Schema validation errors have a default error message which, while technically correct, may not tell the full story to the user. For example:

>>> from dnadna.utils.config import ConfigValidator
>>> schema = {
...     'type': 'object',
...     'properties': {
...         'loss_weight': {
...             'type': 'number',
...             'minimum': 0,
...             'maximum': 1
...         }
...     }
... }
...
>>> validator = ConfigValidator(schema)
>>> validator.validate({'loss_weight': 2.0})
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'loss_weight':
2.0 is greater than the maximum of 1

However, if the schema has an "errorMsg" for "loss_weight" we can give a more descriptive error. The value of "errorMsg" may also include the following template variables:

  • {property} the name of the property being validated

  • {value} the value of the property being validated

  • {validator} the name of the validation being performed (e.g. 'minimum')

  • {validator_value} the value associated with the validator (e.g. 1 for "minimum": 1)

Let’s try adding a more descriptive error message for validation errors on "loss_weight":

>>> schema = {
...     'type': 'object',
...     'properties': {
...         'loss_weight': {
...             'type': 'number',
...             'minimum': 0,
...             'maximum': 1,
...             'errorMsg':
...                 '{property} must be a floating point value '
...                 'between 0.0 and 1.0 inclusive (got {value})'
...         }
...     }
... }
...
>>> validator = ConfigValidator(schema)
>>> validator.validate({'loss_weight': 2.0})
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at 'loss_weight':
loss_weight must be a floating point value between 0.0 and 1.0
inclusive (got 2.0)

Note

In the above example it would have been just as easy to explicitly write loss_weight in the error message instead of the template variable {property}, but the latter is more reusable (e.g. in definitions) and was used in this example for illustration purposes.

The "errorMsg" property may also be an object/dict, mapping the names of validators to error messages specific to a validator. If it contains the validator "default", the default message is used as a fallback for any other validators that do not have a specific error message. For example, the following schema requires an array of at least one unique string. It provides a custom error message for minItems, but not for the other properties:

>>> schema = {
...     'type': 'array',
...     'items': {'type': 'string'},
...     'minItems': 1,
...     'uniqueItems': True,
...     'errorMsg': {
...         'default':
...             'must be an array of at least 1 unique string',
...         'minItems':
...             'array was empty (it must have at least 1 item)'
...     }
... }
...
>>> validator = ConfigValidator(schema)
>>> validator.validate([1, 2])
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config at '1': 2 is not of type 'string'
>>> validator.validate(['a', 'a'])
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config: must be an array
of at least 1 unique string
>>> validator.validate([])
Traceback (most recent call last):
...
dnadna.utils.config.ConfigError: error in config: must be an array
of at least 1 unique string
>>> validator.validate(['a', 'b', 'c'])

class dnadna.utils.config.DeepChainMap(*maps, overrides={})[source]

Bases: ChainMap

Like collections.ChainMap, but also automatically applies chaining recursively to nested dictionaries.

For example, if two dictionaries in a DeepChainMap dc each contain the key 'c' holding a dictionary, then dc['c'] returns a DeepChainMap of those dictionaries. This follows the tree recursively until and unless the key 'c' in one of the parent maps does not refer to a dictionary; this can have the effect of “clobbering” dicts higher up in the tree. It is also possible to prevent recursion at a specific key by providing overrides.

Parameters:

maps (list) – The sequence of mappings to chain together

Keyword Arguments:

overrides (list) – (optional) – List of tuples giving the path to a key whose value should be overridden entirely by the mapping before it in the maps sequence. This is only relevant when the value is a dict: Rather than merging the two dicts into a DeepChainMap, the first dict overrides the value of the second.

maps

The sequence of mappings that is walked when looking up keys in a DeepChainMap. A key is looked up first in .maps[0], then in each subsequent map in turn until it is found or the sequence is exhausted.

Type:

list
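
For example (assuming the list repr inherited from ChainMap.maps):

>>> from dnadna.utils.config import DeepChainMap
>>> DeepChainMap({'a': 1}, {'b': 2}).maps
[{'a': 1}, {'b': 2}]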

overrides

List of paths into the mapping in the format (key, subkey, ...) providing which keys should be overridden by values earlier in the maps list (see examples).

Type:

list

Examples

>>> from dnadna.utils.config import DeepChainMap

Simple case; this is no different from a regular collections.ChainMap:

>>> d = DeepChainMap({'a': 1, 'b': 2}, {'b': 3, 'd': 4})
>>> dict(d)
{'a': 1, 'b': 2, 'd': 4}

But when some of the maps contain nested maps at the same key, those are now also chained. Compare with regular collections.ChainMap, in which the left-most dict under 'b' completely clobbers the dict in the right-hand 'b':

>>> from collections import ChainMap
>>> left = {'a': 1, 'b': {'c': 2, 'd': 3}}
>>> right = {'a': 2, 'b': {'c': 4, 'f': 5}, 'g': 6}
>>> c = ChainMap(left, right)
>>> dict(c)
{'a': 1, 'b': {'c': 2, 'd': 3}, 'g': 6}

With DeepChainMap the dicts under 'b' are chained as well. The DeepChainMap.dict method can be used to recursively convert all nested dicts to a plain dict:

>>> d = DeepChainMap(left, right)
>>> d.dict()
{'a': 1, 'b': {'c': 2, 'd': 3, 'f': 5}, 'g': 6}

As mentioned above, nested chaining only continues so long as the dict in the chain also contains a dict at the same key; a non-dict value can in a sense “interrupt” the chain:

>>> d = DeepChainMap({'a': {'b': 2}}, {'a': {'c': 3}}, {'a': 5},
...                  {'a': {'d': 4}})
>>> d.dict()
{'a': {'b': 2, 'c': 3}}

You can see that the right-most {'a': {'d': 4}} is ignored since just before it {'a': 5} does not have a dict at 'a'. However, if 'a' is missing at some point along the chain that is not a problem; the nested mapping continues to the next map in the chain:

>>> d = DeepChainMap({'a': {'b': 2}}, {'a': {'c': 3}}, {},
...                  {'a': {'d': 4}})
>>> d.dict()
{'a': {'b': 2, 'c': 3, 'd': 4}}

You can also “interrupt” the chaining for dict values by providing the overrides argument; this is an advanced usage. In the first case d['a']['b'] is merged from both dicts:

>>> d = DeepChainMap({'a': {'b': {'c': 2}}, 'w': 'w'},
...                  {'a': {'b': {'d': 3}}, 'x': 'x'})
>>> d.dict()
{'a': {'b': {'c': 2, 'd': 3}}, 'w': 'w', 'x': 'x'}

But by passing overrides=[('a', 'b')], merging stops short at d['a']['b']:

>>> d = DeepChainMap({'a': {'b': {'c': 2}, 'w': 'w'}},
...                  {'a': {'b': {'d': 3}, 'x': 'x'}},
...                  overrides=[('a', 'b')])
>>> d.dict()
{'a': {'b': {'c': 2}, 'w': 'w', 'x': 'x'}}

Here you can see that the dicts keyed by ['a']['b'] were not merged, and only the first one was kept.

copy(folded=False)[source]

New DeepChainMap or subclass with a new copy of maps[0] and refs to maps[1:].

If folded=True, however, it returns a copy with all maps folded in so that there is only one map in the resulting copy; that is, it is equivalent to DeepChainMap(chain_map.dict()).
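
A sketch of the difference (assuming .maps reprs as a list of plain dicts):

>>> from dnadna.utils.config import DeepChainMap
>>> d = DeepChainMap({'a': 1}, {'b': 2})
>>> d.copy().maps
[{'a': 1}, {'b': 2}]
>>> d.copy(folded=True).maps
[{'a': 1, 'b': 2}]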

dict(cls=<class 'dict'>)[source]

Recursively convert self and all nested mappings to a plain dict, or the type specified by the cls argument.

get_owner(key, parent=False)[source]

Given a key, return the first nested map that contains that key.

Examples

>>> from dnadna.utils.config import DeepChainMap
>>> cm = DeepChainMap({'a': 1, 'b': 2}, {'b': 3, 'c': 4})
>>> cm.get_owner('b')
{'a': 1, 'b': 2}
>>> cm.get_owner('c')
{'b': 3, 'c': 4}

If parent=True, in the case of nested DeepChainMaps, it returns the “inner-most” DeepChainMap containing the key rather than the underlying plain map. For example:

>>> inner = DeepChainMap({'c': 3}, {'d': 4})
>>> outer = DeepChainMap({'a': 1, 'b': 2}, inner)
>>> outer.get_owner('d')
{'d': 4}
>>> outer.get_owner('d', parent=True)
DeepChainMap({'c': 3}, {'d': 4})
items() → a set-like object providing a view on D's items[source]

keys() → a set-like object providing a view on D's keys[source]

values() → an object providing a view on D's values[source]

dnadna.utils.config.load_dict(filename, **kwargs)[source]

Loads a nested JSON-like data structure from a given filename.

May support multiple serialization formats, determined primarily by the filename extension. Currently supports:

  • JSON (.json)

  • YAML (.yml or .yaml)
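
A minimal round trip through YAML (the data.yml filename is arbitrary; reusing the pytest tmp_path fixture as in the other examples):

>>> from dnadna.utils.config import load_dict, save_dict
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> save_dict({'a': 1, 'b': [2, 3]}, tmp / 'data.yml')
>>> load_dict(tmp / 'data.yml')
{'a': 1, 'b': [2, 3]}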

dnadna.utils.config.load_dict_from_json(filepath)[source]

Load a JSON file as a dict.

Shortcut for load.

Parameters:

filepath (str) – filepath to the json file

dnadna.utils.config.save_dict(obj, filename, **kwargs)[source]

Serializes a nested JSON-like data structure to a given filename.

The serialization format is determined by the filename.

May support multiple serialization formats, determined primarily by the filename extension. Currently supports:

  • JSON (.json)

  • YAML (.yml or .yaml)

dnadna.utils.config.save_dict_annotated(obj, filename, schema=None, validate=False, serializer=<class 'dnadna.utils.serializers.YAMLSerializer'>, **kwargs)[source]

Serializes a (possibly nested) dict to YAML, after (optionally) validating it against the given schema, and producing comments from the title/description keywords in the schema.

Parameters:
  • obj (dict, Config) – The dict-like object to save.

  • filename (str, pathlib.Path, file-like) – A filename or pathlib.Path, or open file-like object to which to stream the output.

Keyword Arguments:
  • schema (str or dict) – (optional) – A schema given either as the name of a schema in the schema registry, or a full schema object given as a dict. If omitted, this is equivalent to calling save_dict with a YAML file, and no annotation is added.

  • validate (bool) – (optional) – Validate the given object against the schema before writing it (default: False). This can be used in case the object is not already known to be valid against the schema.

  • serializer (DictSerializer) – (optional) – Specify the DictSerializer to use; normally this should be the YAMLSerializer since it’s the only one (currently) which supports comments.

Examples

>>> from io import StringIO
>>> from dnadna.utils.config import save_dict_annotated
>>> schema = {
...     'description': 'file description',
...     'properties': {
...         'a': {'type': 'string', 'title': 'a',
...               'description': 'a description'},
...         'b': {'type': 'integer', 'description': 'b description'},
...         'c': {
...             'type': 'object',
...             'description': 'c description',
...             'properties': {
...                 'd': {'description': 'd description'},
...                 'e': {'description': 'e description'}
...             }
...         },
...         'f': {'description': 'f description'}
...     }
... }
...
>>> d = {'a': 'foo', 'b': 2, 'c': {'d': 4, 'e': 5}, 'f': 6}
>>> out = StringIO()
>>> save_dict_annotated(d, out, schema=schema, validate=True, indent=4)
>>> print(out.getvalue())
# file description
# a
#
# a description
a: foo
# b description
b: 2
# c description
c:
    # d description
    d: 4
    # e description
    e: 5
# f description
f: 6

dnadna.utils.config.save_dict_in_json(filepath, params)[source]

Save a dictionary into a json file.

Parameters:
  • filepath (str) – filepath of the json file

  • params (dict) – dictionary containing the overall parameters used for the simulation (e.g. path to the data folder, number of epochs…)
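
A sketch of usage (note the filepath-first argument order, the reverse of save_dict):

>>> from dnadna.utils.config import (save_dict_in_json,
...                                  load_dict_from_json)
>>> tmp = getfixture('tmp_path')  # pytest specific
>>> filepath = str(tmp / 'params.json')
>>> save_dict_in_json(filepath, {'data_root': '.'})
>>> load_dict_from_json(filepath)
{'data_root': '.'}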